ABSTRACT

This paper has two aims, each bearing on recent developments in evaluation policy. The first is to summarize a report presented in 1980 to the U.S. Congress and Department of Education, concerning evaluation policy and practices at the national, state, and local levels of government. The second aim is to link some of the United States recommendations to ideas presented at the Israel-U.S. seminar on education evaluation and in related papers by the seminar participants. The intent is (1) to outline similarities, differences, and analogs between the two perspectives in order to learn how the Israeli experience can be adapted by the United States, and vice versa; and (2) to examine how the problems evidenced in the arena of evaluation are not confined by national borders, ethnic origin, or history. (PN)
ON EVALUATION POLICY IN THE UNITED STATES AND ISRAEL¹


Robert F. Boruch


1. INTRODUCTION 

Attempting to understand which of several policies or programs has the greater benefit is not, of course, a novel human enterprise. Comparative tests to understand conditions under which children learn speech, for instance, were undertaken by the Mughal emperor Akbar the Great in 16th century India. Competing theories of human development fired rabbinic argument and theories of evidence during the same period in the Middle East.
Nor are the sophisticated logic and statistical theories underlying the fair comparison of programs especially new. They are represented in 18th century scholars' attempts in Europe to understand numerical evidence and in independent contributions during the same period to characterize the toxicity of metals, chemicals, and drugs. Finally, there are some distinctive early precedents for controlled field tests of social programs. They include experiments on the effects of sanitation instruction in Syria during 1933-33, and on the comparative benefits of raw and pasteurized milk in nutrition programs for English schoolchildren in 1930.²

What is relatively novel about evaluation is the regularity of formal government interest in understanding the comparative effects of new social programs, and increased government willingness to estimate effects in pilot tests of the programs. This interest in effectiveness is linked in principle to systematically establishing the need for programs and the quality of their implementation. The latter are no less important than estimating effects but, until recently, had not been routinely required by law.

This paper has two aims, each bearing on recent developments in evaluation policy. The first is to summarize a report that we presented in 1980 to the U.S. Congress and Department of Education, concerning evaluation policy and practices at the national, state, and local levels of government. The focus here is on recommendations, and the treatment is very brief. The report itself (Boruch and Cordray, 1980) provides details and is readily accessible, and the literature review on which it is based has been published elsewhere (Boruch and Wortman, 1979).
The second aim is to link some of the U.S. recommendations to ideas presented at the Israel-U.S. seminar on education evaluation and in related papers by the seminar participants. This examination, too, is brief, rather too brief to do real justice to the ideas proposed. But the intention is simply to outline similarities, differences, and analogs between the two perspectives. One of my motives is pragmatic: to learn how the Israeli experience can be adapted by the United States, and vice versa. The second motive is based on the simple premise, suggested earlier, that problems of evidence in this arena are not confined by national borders, ethnic origin, or history. Understanding how durable problems and their solutions arise is a task that, as a theoretician, I can ill afford to ignore.

2. THE HOLTZMAN REPORT TO THE CONGRESS

The Education Amendments of 1978 required that the Secretary of the U.S. Department of Education conduct a comprehensive review of federal evaluation practices and procedures. Introduced as a bill by Congresswoman Elizabeth Holtzman, the law directs attention to federally supported programs at the national, state, and local levels of government. In response, two projects were initiated by Department and Congressional staff. A group at Northwestern University was asked to undertake the first in September 1980. The National Academy of Sciences' Committee on Program Evaluation was asked to initiate parallel, independent work. The results of each are reported in Boruch and Cordray (1980) and Raizen and Rossi (1981), respectively.³

The questions covered in Northwestern's report to the Congress and the Department of Education are fundamental. They were implied by the law and the conference reports preceding it:

• Why and how are evaluations carried out?
• What are the capabilities of those who carry out evaluations?
• How are the results of evaluation used?
• What recommendations can be made to improve procedure or practice?

The study was prospective in its orientation, designed to provide evidence and argument bearing on these questions and to provide recommendations that would help to ameliorate the problems identified. The findings and recommendations stemmed from two general sources of information: contemporary investigations by other researchers and agencies, and direct investigations by Project staff. The latter included site visits to local and state education agencies and telephone surveys of local units, both based on a stratified random sample. Round-table discussions were undertaken to capitalize on special expertise in topics such as school board interest in evaluation. Interviews were carried out with some staff members of all major federal agencies with an interest in educational evaluation. These included the U.S. General Accounting Office and the Congressional Budget Office as well as the education agencies. The literature review covered both unpublished and published documents, including reports maintained by ERIC and, in the case of statutes, by the LEGIS system. An earlier review served as a guide to sources on national studies published before 1979.

The project report made recommendations to the Congress and to the U.S. Department of Education. The two sets are condensed in the following treatment, each coupled to a brief rationale. The links to Israeli work are discussed after each recommendation.

3. REQUESTING AND PLANNING EVALUATIONS

Three of the Report's recommendations concern the process of deciding what kind of evaluations can or should be done and the way they should be done. They stress the necessity for regular meetings to establish information needs, the merit of specificity in evaluation law, and the reduction of constraints on the exchange of information.



On clarifying needs, audiences, and options

We recommended that the Congress direct the relevant staff of Congressional committees and support units, such as the U.S. General Accounting Office and the Congressional Budget Office, to meet regularly with evaluation staff of the Department of Education with instructions to: 1) identify specific committees and groups as audiences for evaluation results, 2) reach agreement about when particular evaluations are warranted and the extent to which each evaluation required by law is possible, 3) clarify Congressional information needs, the quality and type of evidence required, and a planning cycle for each major evaluation required by law, and 4) identify the changes in programs or understanding which could occur on the basis of alternative findings. Parallel suggestions were also made to the Department of Education.

The recommendation is, at its simplest, embarrassingly mundane. It asks that the principals meet. And indeed periodic efforts have been made by Congressional and Department staff to assure that the production of reports coincides with authorization cycles and that Congressional needs are understood. But the process has been less orderly, less regular, and less thorough than it ought to be. This recommendation depends heavily on the fact that a legislative demand for "evaluation" is often ambiguous. The word can imply any activity from journalistic reporting to full-blown, long-term field experiments dedicated to estimating the effects of innovation on children.⁴ The involvement of multiple interest groups is often necessary. But this complicates matters further, since all are unlikely to agree on just what sort of evaluation is warranted. At worst, general legal demands to evaluate that are unaccompanied by serious discussion obscure the fact that the feasibility of particular kinds of evaluation varies enormously and that elaborate evaluation may be unwarranted.

We recommended that in constructing laws for evaluation, the Congress: 1) specify exactly which questions ought to be addressed and the audiences to whom results should be addressed, when specification is possible; 2) provide for formal assessment of the evaluability of the relevant program where specification of questions is not possible; and 3) provide for statistically valid field testing of proposed evaluation requirements where specification is not possible and in-house assessment is insufficient.


Though statutes are frequently explicit about routine reporting requirements, references to evaluation are often ambiguous. The requirement, for instance, to evaluate "whether the program meets the objectives of the statute" is common but vague. The published hearings, covering public testimony submitted prior to enactment of a law, are not always informative.

Defining evaluation requirements in terms of the questions that should be addressed is sensible so long as the questions themselves are clear, answering them is feasible, and the answers are likely to be useful. The particular questions that often need to be addressed are: How many are served, and how many need service? What are the services and their costs? What are the effects of programs on their primary or secondary clients? What are the costs and benefits of services? The early specification of audiences, especially particular committees or Congressional support agencies, should enhance the usefulness of reports.

We recognized that explicitness in law is often neither feasible nor desirable. Consequently, we suggested formal investigation of evaluability (Wholey, 1977) within a year after enactment of a demand for evaluation, to clarify questions, audiences, and the ways in which results could be used. We recommended field tests of reporting requirements in the interest of assuring that the costs and benefits of reports, and the users and uses of reports, were well understood.


Authority for technical discussion


The third recommendation in this class urged that the Department authorize the technical staff of evaluation units to initiate discussion of evaluation plans with pertinent Congressional staff, at their discretion, and to refrain from directives which might impede direct discussion.

The impetus for the recommendation was simple: competent evaluators can expect to do a good job only when they have frequent opportunity to discuss Congress's information needs. Restrictions on the evaluation unit's initiating discussion with the Congressional staff of committees that demand evaluation prevent the job from being done better. Such restrictions were made formal by, among others, Joseph Califano during his tenure as Secretary of the Department of Health, Education and Welfare. The Report recognized that some restrictions on bureaucratic lobbying for programs are warranted, and that some administrative rules are necessary to keep the process of communication between agencies and the Congress orderly. But restrictions also engender a lack of clear opportunity to identify which information Congress can use. This in turn decreases the likelihood that evaluations will be timely, relevant, and credible, and the likelihood that the Congress will find the results useful. Relaxing restrictions will not, of course, guarantee usefulness.

Remarks

Three aspects of the Israeli papers are pertinent to these recommendations.

Mapping the Question. The first concerns Louis Guttman's mapping sentence which, as Lewy describes it, is a remarkably terse statement for helping one identify critical decision points in the evaluation: when information is warranted (at what stage of the program), what entity ought to be evaluated, why the information is needed, and how it should be obtained. This literal map is implicit in our first two recommendations to the Congress. Moreover, it constitutes a neat working rule for the individuals responsible for planning evaluations at the national level. It takes no wit to see that it can be formally adopted as well in work at state and local levels of government.

Audiences for Results. The mapping sentence dedicates no explicit attention to the matter of whose needs must be served by the evaluator, but both the Holtzman Report and the papers in this volume do so. For instance, one of our major suggestions was to "identify audiences for results as soon as possible." At its worst, this is merely pious exhortation: the turnover of staff members in the Congress and at the executive level is high enough to threaten some short-run projects and most long-term research. But it is a practical suggestion to the extent that career bureaucrats and Congressional committee staff responsible for evaluation are a stabilizing influence and can serve as a vehicle for identifying both transient and durable users of information.

The idea of regular meetings between evaluators and users of information is not different in principle from the tactic already used at the Israel Curriculum Center (ICC), judging from Lewy's paper. The ICC's use of a liaison person as a bridge to users and as an expediter seems sensible for information exchange, building trust, and developing a common vernacular. Note, though, that our recommendations concern only evaluators and users; production or development agencies are ignored. In principle at least, the ICC liaison approach is adaptable to working relations between evaluators and developers in the U.S. as well, at the federal, state, and local levels.

Developing an Evaluation Portfolio. Lewy cites a 1970 article by Alkin suggesting that the task of evaluators is, in no small measure, "ascertaining the decision areas of concern." Lewy expands on this to argue that the evaluation unit adopt this as a fundamental operating principle, and moreover that "the selection of the topic for evaluation should be done on the basis of consent between or at least compromise among the two teams," the two teams being the evaluation unit and the program development team. The Kugelmass discussion of the Reform Junior High School change instituted by the Israeli Knesset makes a similar point. In 1968, the law altered the then conventional 8-year primary school and 4-year high school program into a new 6-3-3 sequence in government schools. Kugelmass suggests that there was a great deal of difficulty in meetings between the administration of the Ministry of Education and researchers, and stresses that making a decision about what sort of evaluation to do, under differing pressures from diverse interest groups, is neither simple nor easy. Dan Davis, too, makes the point. But he emphasizes that once a preference is made explicit about the desirability of outcome evaluation, at least, the evaluation must be under considerable control by the evaluator to assure a reasonably successful evaluation.

The problem here is not new, of course. It underlies any attempt to build a coherent research and development agenda, and any effort to choose among products for manufacture or corporations for acquisition. Nick Smith's labelling of the process of trying to choose among evaluation enterprises as portfolio development is also apt. (The label may assist in translating ideas about evaluation to a nontechnical audience, such as a legislature, that is sometimes more sensitive to business than to the evidential basis for government.)

Nor is the difficulty of choosing what sort of evaluation to perform confined to Israeli borders. The U.S. encounter with the same problem is one of the reasons for making clear the choice, and the basis for choice, in the Holtzman Report's recommendations. The pertinent evidence comes from a variety of sources. Mary Kennedy (1980), for instance, suggests that efforts to compare the relative effectiveness of two or more strategies of (say) instruction are not common at the school district level. Her message is that impact assessment is less frequent and less important than other evaluation questions in the local agencies. Charles Stalford (1980) quite properly warns against a "testing-only model of evaluation" (p. 6). The Holtzman Project and other work seem to support this contention: activities other than impact estimation are important, and their importance varies with the level of government and with the agency within government.

This, of course, does not mean that comparisons are unimportant, merely that they can receive low priority. The reasons may include simple inability to create variations that are cheaper and more productive than the existing one and that are worth testing. For instance, in response to a suggestion that the Agency for International Development test variations, one AID staffer complained plaintively that they had had enough trouble creating one variation, and that creating more just to be able to find the most effective one was too onerous to countenance.

The more general implication is that within a school district, or at any other level of government, a de facto portfolio of evaluation activities is created. The question this engenders is how such a portfolio can or should be built. Consider, for instance, the first factor that might be taken into account when developing a portfolio: the source of the inquiry or the target audience for evaluation results. Is it sufficient to rely solely on evaluative questions from instructors, parents, or program managers to develop a portfolio? Probably not, since asking the right questions requires some skill and informed conceptions of evidence. Can one rely solely on the evaluator? Probably not, since this assumes too much knowledge of substantive problems. What do the stereotypical portfolios look like in this respect? The U.S. General Accounting Office initiates about two-thirds of its own inquiries, and most of these are managerial studies. Should evaluation offices at the state level build a portfolio in the same way? We know very little about this. Nor can we give much advice.

The second factor is time. We know that fast-turnaround studies are essential to satisfy a public or a superior with a short attention span, if not to actually resolve durable problems. And so, perhaps, most evaluative studies need to be short in the interest of evaluator survival. But all evaluators, especially those in government and academe, do have some responsibility for finding long-term effects of programs, and for understanding long-term social problems as well. There is no technology for designing evaluations which produce short-term, interim, and long-term results.

The strategy has been to elicit suggestions for evaluations from directors of substantive programs. The ultimate choice is based partly on agreement in principle between evaluators and these agencies. But it may be superseded by agreements with the Secretary of Education or by other criteria used in making decisions. The other criteria include expiration dates for legislation bearing on programs, the period during which a legislative committee could be expected to use information, and whether high priority programs have been evaluated. The choices are incorporated into three plans of evaluation.

Very little intellectual attention has been dedicated to a third factor: the administrative mechanism which yields the evaluation portfolio, and which can be used to terminate projects that are not turning out well. The system at the U.S. General Accounting Office appears to be hierarchical, involving screening committees that ultimately determine the choice of project, and a special committee for termination. Nominations of topics to be investigated come from staff groups with operating responsibility. Until recently, the Office of Evaluation and Dissemination at the Office of Education had a similar system. But it is not clear how termination decisions were made.

The main point is that criteria for developing and assessing the value of an evaluation portfolio are not yet clear. Serious attention has been given the matter by bureaucrats, not by executives or academics, and their effort can profitably be augmented by others.

4. EVALUATOR CAPABILITIES

The Project staff had been asked to investigate the capabilities of those who do evaluations and to make recommendations based on our findings. We recommended that the Congress and the Department: 1) assess the capabilities of local and state education staff before new statutory evaluation requirements are directed at them, in order to determine where resources are adequate to meet the demand; 2) expand training or technical assistance when the demands are notable and capabilities low; and 3) explore the feasibility and desirability of direct contract programs to capitalize on capabilities in strong local and state education agencies.

The first part of the recommendation stems partly from the fact that no real standard for assigning the title "evaluator" exists, and that the skills required of the evaluator depend heavily on the nature of the evaluation demand and on local and state interest in evaluation. The second part is based on the finding that most local and state agencies need assistance when the evaluation requirements are technical. The minority of these agencies that do have strong evaluation units are a major resource, and we believe that direct grant opportunities should be expanded to capitalize on them.

By determining capabilities, we mean understanding whether there can be a reasonable match between what the law demands of local and state evaluators and the skills of these individuals. A formal assessment of this sort is unlikely to be easy, for three reasons. First, within a school district or state office, evaluation responsibility may be split up among several individuals, none of whom may have any pertinent training, and this responsibility can often change. Second, evaluation duties may have a considerable range, depending on local interest in exploiting systematic information to improve programs. Just meeting minimum federal requirements requires far different resources than establishing a long-term research program. Finally, the methods one might exploit to perform capabilities assessments are not clear. They range from intensive task analyses during, say, pilot tests of new regulations that require a specific type of evaluation, to telephone surveys that enumerate skills and tasks. Drs. Georgine Pion and David Cordray are now developing plans to accommodate such problems and to implement assessments for local education agencies and for community mental health centers.

A critical influence on the matter is whether an education agency decides just to accommodate federal requirements or goes beyond these to mount a stronger evaluation program. Even just meeting requirements demands some skill. The notion of temporal instability or reliability of tests is not obvious to many people, despite training in a substantive education area and in the history of testing. As a consequence, we urged that the federal program sponsor make an effort to capitalize on evaluation unit expertise in training local or state staff to meet demands, and that funds be allocated for such training. Options such as cooperative arrangements among small education agencies for joint support of an evaluation unit, and expansion of federally supported technical assistance centers, need to be explored, and the Report suggests doing so.

For local and state agencies that are willing to go beyond federal requirements, we stressed direct grants from the federal government for two purposes. First, some agencies are capable of mounting research and evaluation programs that match federal efforts in quality and are more pertinent to local interests. They are in the minority, accounting for probably no more than 400 of the 15,000 school districts and less than half the state agencies, and they deserve to be given an opportunity to produce good work that can be applied to other areas. The second purpose is to foster closer ties between local agencies and university evaluation groups. Such arrangements are bound to be difficult, but it is hard to see how the state of the art in evaluation can be advanced without better ties between the two.

There are several points of correspondence between the findings on which these recommendations are based and the opinions registered in the seminar papers. Consider, for example, Lewy's conclusion that "a realistic assessment of actual need in terms of manpower and other resources and their satisfactory provision constitute a prerequisite for the successful operation of the evaluation unit. It is not the absolute size of the budget which determines the successful operation... but rather the match between the resources available and appropriate definition of the evaluation tasks." This is remarkably similar to the conclusions tendered by Pion and others in the Holtzman Report: that evaluation tasks vary widely among school districts, that the skills required to do those tasks vary as well, and that the tasks have to be understood before resources can be intelligently allocated to training and before laws demanding wholesale evaluation can be conscientiously constructed.

The second point of correspondence lies in Kugelmass' observation that, as a result of the academic emphasis on basic research and theory, the manpower available for task-oriented research such as evaluation is scarce. Tamir points out that even where university-trained researchers are available, there will be a notable tension between the research-oriented view of what should be done at what level of accuracy, and what the manager or practitioner believes is warranted. That same problem has appeared in the United States and is being resolved in several ways. A tight market in university jobs seems to have resulted in better people going into government, into independent contract research institutes, and into evaluation units at the local and state levels. The migration engenders problems, but it is also reasonable to expect better understanding of local practical problems and wiser evaluators. The federal government has assisted by creating technical assistance centers to respond solely to local needs for advice. The centers are staffed by university-trained people; not all local programs, however, are assisted by such centers.

A third point of correspondence is Lewy's conclusion that the evaluation unit serves as a "catalyzer for initiating evaluation activities, the limits of which exceed the working capacity of the unit itself." This suggests that developers get interested in evaluation to the extent that the working relationship with evaluators is close, and that this interest can be used to expand the effective size of the evaluation unit's staff. There is an analog here to efforts at the local and state levels in the U.S. to augment the evaluation requirements set out by the federal government. In particular, though only a minority of the evaluation units and research units within school districts are strong, this minority used the minimal requirements as a vehicle for collecting additional information of more direct relevance to local interests. States such as California and Massachusetts also have this utilitarian perspective, building on federal investments and requirements to do a better job in meeting federal, state, and local demands.

One feature of the Israeli experience for which there is no routine analog is the use of a liaison person to link the program development group with an evaluation group. Informal arrangements of the sort do appear in the U.S., but the role seems much better articulated in the ICC. Lewy's description of the liaison person's training and skill is especially interesting. If I understand it correctly, the three types include those with evaluation training of a substantial sort, those with substantial substantive training, and the project director. Lewy suggests that the project director does not work out too well because he hasn't got the time, and that the substantive area expert is probably best because he is immersed in the project itself.

5. DESIGN AND EXECUTION OF OUTCOME EVALUATIONS

Once said, it is obvious that quality in the design of an outcome evaluation affects the quality of the data and of the conclusions. The evidence that bad design can make programs look worse than they are, or better than they are, or yield ambiguous evidence is substantial. The theory that organizes understanding of biases in estimating program effects is, however, reasonably well articulated.

The idea that quality in design ought to be recognized as a formal part of evaluation policy is explicit in federal education agency attempts to yoke the introduction of new programs with design, as in evaluation of the programs supported under the Emergency School Assistance Act; in attempts to review designs with more vigor within agencies; and in efforts to provide technical assistance programs in the interest of better local design. It appears also in the U.S. General Accounting Office's attention to competing explanations characteristic of poor designs, to the elements of reasonable design, and to the need for designing evaluations before a new program is put into the field. It has been recognized by the courts in cases outside education which acknowledge the flaws in some evaluation designs and the benefits of others. The Supreme Court's Federal Judicial Center, for example, is developing policy on the use of randomized field experiments in legal settings to make clear the issues and precedents. The theme of quality, though, is not sufficiently well established to flourish without periodic reiteration. The task was undertaken in the Holtzman Report through two recommendations, one made to the Congress and one to the Department of Education. A third, concerning standards, is treated later.

Pilot tests and designs

We recommended that the Congress: 1) routinely consider pilot testing every major new program, major variations on existing programs, and major program components before they are adopted at the national level, using high quality evaluation designs, and 2) authorize the Secretary explicitly, in each statute that requires estimates of the program's effects on target individuals, to use high quality designs, especially randomized field experiments, for planning and evaluating new program components, program variations, and new programs.

The rationale for the first part of this recommendation is that higher quality evaluations are more feasible before the program is adopted at the national level. Political-institutional constraints are likely to be less severe, better designs can be employed, and conclusions then are likely to be less ambiguous. The introduction of new programs can be staged so that earlier stages are pilot tests for later ones. We stress formal tests of new program components and new variations here because such evaluations are not a matter of common practice.

The second part of the recommendation, as well as the first, stems from our conclusion that better designs must be used if the Congress or the Department wants good estimates of the effects of programs on children. We do not advocate estimating those effects in all cases. Estimation is complicated under the best of conditions, despite simplistic announcements that the "program was successful because test scores went up" or that it was unsuccessful because they went down. Nor do we believe that designs that are high quality relative to statistical standards are always feasible, or warranted, for estimating program effects. We do advocate explicit authority in statutes for high quality designs, especially randomized experiments, to facilitate their use. We believe explicit statutory provision is essential because such designs are the best in principle, and that should be recognized. The authorization should provide for review of the use of these designs.

Tests of new program components, program variations, and new programs

We recommended that the Department of Education explicitly authorize the use of high quality evaluation designs, especially randomized experiments, in evaluating new program components, new program variations, and new programs in all regulations that require estimating the effects of innovative changes.

The main justification is that high quality designs lead to less debatable estimates of the effects of programs on children than do low quality designs. They are less difficult to execute, and more feasible, for pilot testing new programs, program variations, and program components than for estimating the effects of ongoing programs. Explicit authorization would make the importance of good designs plain, and would provide a clearer opportunity for competent state education authorities (SEAs) and local education authorities (LEAs) to exploit them. We use the word "authorize" here rather than "require" to make clear that the evaluator is empowered to use an experimental design but need not do so if it is unwarranted or not feasible.

Remarks

The recommendations on randomized field tests were supported by some evidence on their feasibility and appropriateness. A judgement about feasibility in the particular case, we believe, should be based on precedent, for a number of field experiments have been undertaken in education and other areas. It should be based, for complex evaluations, on pilot tests of the experimental procedure itself, for one cannot anticipate all problems engendered by the method. And it should be based on independent criteria, such as whether the service is in short supply and randomized assignment is indeed an equitable method of allocating it. These criteria need to be explicated better, and they need to be linked to a broader testing strategy.
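Where a service is in short supply, a lottery is the simplest way to make randomized assignment an equitable allocation rule: every eligible applicant has the same chance of a place, and those not drawn form a natural comparison group. The Python sketch below is illustrative only; the applicant list, the number of slots, and the function name `lottery_allocate` are hypothetical.

```python
import random

def lottery_allocate(applicants, slots, seed=None):
    """Hypothetical lottery: when eligible applicants outnumber available
    slots, every applicant has the same chance of a place. Winners receive
    the service; the rest form a randomized comparison group."""
    pool = list(applicants)
    if not 0 <= slots <= len(pool):
        raise ValueError("slots must be between 0 and the number of applicants")
    rng = random.Random(seed)
    rng.shuffle(pool)  # uniform random ordering of all applicants
    return pool[:slots], pool[slots:]

applicants = [f"school_{i:02d}" for i in range(1, 41)]
served, comparison = lottery_allocate(applicants, slots=15, seed=2024)
print(len(served), len(comparison))  # 15 25
```

Fixing the seed makes the draw reproducible and auditable, which matters when the allocation must later be defended as fair.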

Standardized Evaluation. In doing both, we can rely partly on Davis' presentation. He proposes that six conditions must be met in order to obtain decent estimates of the effects of programs. 1) The program has to have relatively clear goals and operating procedures; that is, it must be implementable. 2) The evaluator must be responsible for both the program operation and its evaluation, maintaining special control over evaluation. 3) The program must be implemented first in an optimal setting, with field conditions, training, and the like being the best possible. 4) Schools must be selected for their willingness to participate in the research. 5) The research design must approximate laboratory models in terms of assignment and execution of evaluation. 6) The results of the evaluation must serve as a standard against which normal field operations can be judged.

Not content just to lay out conditions, Davis is attempting field trials under these conditions on a program that has never been investigated well, despite its attractiveness, in the U.S. or elsewhere. The program is, as I understand it, a national tutoring effort in which university students get academic credit for helping children in grades five through nine. The regular program has minimal supervision, and the more elaborate version involves intensive supervision and more hours of tutoring. The more intensive version is distinctive in that "the program is a supervisory and guidance structure which is sensitive to the problems encountered by the tutors and can help in solving them, and the program is maximally flexible so that it can adapt to the specific conditions of each tutor-tutee relationship". Earlier evaluations of the ongoing program show mixed results. Consequently, a good field test of the ideal version of the program is a natural way to understand what the maximum effects of tutoring can be.


Apart from the conditions that Davis proposes, his general strategy of conducting a controlled experiment of an optimal program to gauge the maximum effects of subsequent or ongoing programs is an attractive one. It is generalizable to the U.S., at least in principle, and does not appear to have been suggested before, at least not as explicitly. There have been related suggestions, however. For instance, the Riecken et al. (1974) volume on social experimentation stressed the idea that in field tests of programs, one ought to assess not only program levels that are clearly feasible, but also some that are not practical at the national level. The argument is based on the premise that "practical" programs are often weaker than we expect them to be, and that high dosage (optimal) programs, though impractical at one time, may be practical in the future. This is especially likely if one finds that the higher intensity does produce notable effects while low dosage "practical" programs have no detectable effect at all.

Testing Components and Variations. No theory of evaluation demands that the effects of an entire program be estimated, and few practitioners would regard such an unqualified demand as sensible. Yet professional vernacular, rhetoric, and legal mandates foster the view that wholesale evaluation is warranted, distracting attention from the possibility of testing components of programs. For example, one may find that running high quality tests of new parent education programs is not possible. But estimating the effect of alternative sources of information, of various ways to present the information, or of ways to prevent ingenuous use of information, and so on, may be possible in small high quality experiments. The strategy of componentwise evaluation has been exploited in the U.S. evaluations of the Emergency School Assistance Act, in research which preceded the development of Sesame Street, and elsewhere. Incorporated into evaluation policy, the idea broadens early options, and in the event of a major evaluation's failure, it is a device for assuring that at least pieces of the program can be assayed properly.


Analogs to this approach are not difficult to find in work at the Israel Curriculum Center, the High School Biology Project, and other projects. The process of identifying which components to evaluate seems to have been routinized best at the ICC, notably by exploiting the mapping sentence approach: Lewy's application suggests that focusing on the entity to be evaluated is integral to the continuous evaluation strategy. The Israeli adoption of high school biology curriculum (BSCS) programs developed in the U.S. is pertinent too. According to Tamir, the BSCS program was developed in three versions. The blue version emphasized biochemical concepts and is relatively sophisticated. The green emphasized an ecological perspective. The yellow stressed a more conventional approach, but was somewhat more interesting and adaptable to the Israeli perspective. The important point here is that three variations of the program were developed. This seems an eminently sensible idea when there are major differences in perspective about how something should be taught and major differences in teacher opinion about how it can be taught. Moreover, the variations' relative effectiveness is testable in principle. In practice, pieces of each are likely to be evaluable.

Dan Davis's "standardized evaluation" is also consistent with this theme. To determine the maximum possible effect of an ongoing program which itself may be difficult to evaluate well, one may invent a very intensive variation on it, an optimal version, and submit this to very well-controlled tests. The idea can help to circumvent the chronic problems of estimating the effects of ongoing programs.

Finally, the plan being developed by Gershon Ben-Shakhar and Baruch Nevo for understanding the effectiveness of matriculation tests reflects some of the same spirit. Formal testing in one part of a large, complex education system, and its evaluation more or less independently of the rest of the system, is an idea worth exploring. The so-called Irish Study has had a distinctive advantage in this regard, since standardized testing is not common in Ireland, and evaluators could introduce it on a trial basis and estimate its effects on teachers and students using randomized experiments and other evaluation designs (Airasian et al., 1978).


6. CRITIQUE AND SECONDARY ANALYSIS OF EVALUATION RESULTS 


The evaluation design, its execution, and the skills of the original investigators are basic to the production of useful information. But they are not always sufficient. The pressure toward less than candid reporting is sometimes great, and it is not always clear that one can resist it. Egregious errors are made, and corroborative or contrary evidence is ignored, for the time available for analysis is not always adequate. The benign skepticism necessary for the in-house evaluation generates a reasonable but parochial picture. A less benign, or at least more impartial, outside analyst could come up with different conclusions. Finally, data generated in social program evaluations constitute a national resource and should be treated as such. The research can be expensive despite the production of data that are useful in short-term decisions. It behooves the evaluator to learn how to exploit the information repeatedly.

Partly for these reasons, we recommended that in statutory requirements for evaluation of major programs, the Congress: 1) require an independent, balanced, and competent critique of evaluation results that are material to policy decisions, 2) require critique of samples of evaluations submitted by LEAs and SEAs in response to legal requirements, and 3) require that statistical data produced by national evaluations be made available for reanalysis.

A complementary recommendation was made to the Department of Education: 1) incorporate into procurement procedures and policy the requirement that all statistical data produced in major program evaluations be documented and stored for reanalysis, and 2) create an administrative mechanism for deciding when simultaneous analysis by both the original evaluator and an independent analyst is desirable and feasible, and a mechanism for executing simultaneous independent analyses.

The text of the Report made it plain that we did not mean adverse commentary in using the word "critique". The idea is to ask for reasoned judgements about whether conclusions drawn from the evaluation are sensible and can inform decisions. The immediate reason for the recommendation is that such criticism is not routine but is essential to enhance the credibility of good evaluations, to properly identify poor evaluations as such, and to provide feedback to federal evaluation units, contractors, and grantees about the quality of their work. There is no formal system for the competent critique of evaluation reports produced by local and state education agencies in response to federal law, yet many such reports could benefit from conscientious review.

The elements of a system for critique and secondary analysis should include: 1) an explicit institutional policy on the rapid disclosure of reports and access to the data underlying the reports, 2) a mechanism for independent critique or secondary analysis, where possible during an evaluation and, where this is not possible, after the report is submitted formally, and 3) guidelines on the reporting and storage of information.


The Report recognizes the problem that criticism may be witless and counterproductive. The recommendation is based on the premise that the long-run benefits will offset the effects of self-interested criticism and the burden that criticism imposes on the evaluator.


Remarks

No explicit attention to this matter is evident in the seminar papers. Rather, the theme is implicit in the spirited exchanges of opinion during the seminar and, if I judge correctly, in the participants' customary stress on presenting research results in the sometimes harsh climate of professional forums. The High School Biology Project, for instance, appears to have generated a large number of fascinating papers that are informative to researchers inside and outside Israel. As I understand it, the Office of the Chief Scientist can also serve as a device for independent critique, and perhaps secondary analysis as well. But the ordinary mission of the office, in advising and deciding on research projects, can dilute administrative independence. Similar offices in the United States, such as the Director of the research-oriented National Institute of Education, operate with a similar constraint: fiscal, administrative, and bureaucratic independence is a matter of degree.

Reasonable critique depends on standards, and standards were addressed in both the Holtzman Report and the seminar papers. They are considered in the next section.


Analysis and Competing Models. The Holtzman Report did not examine the methods of data analysis used in evaluation. Its principal audience was nontechnical, and most of the questions it addressed were answered in a nontechnical way. Other research produced by Northwestern, however, has examined technical issues in assuring access to and the quality of data for analysis, and the nature of competing analyses that might be undertaken. The volume edited by Boruch, Wortman, and Cordray (1981), for instance, considers both policy and practice, and can be regarded as an explication of the Holtzman Report's recommendations on secondary analysis.

Itai Zak's paper is most pertinent to this level of detail. It represents a statistical tradition of trying to understand the structure underlying data, of establishing the extent to which a theory represented mathematically is consonant with the information. The tradition is represented most visibly in econometrics, but recent attempts by Goldberger, Jöreskog, and others to link approaches in psychometrics and econometrics through structural equation models, and the quantitative sociologists' work on the latter, have led to some remarkable advances in facilitating and understanding their use. Several points bearing on the general topic seem worth making. They are implicit in Zak's approach.

First, methods such as Wold's PLS and Jöreskog's maximum likelihood approaches permit one to relax the assumptions characteristic of conventional textbook approaches such as regression analysis. Simply put, they allow one to build more realistic models of reality. That this advantage is not trivial is apparent from policy-relevant research in the U.S. For example, the Westinghouse-Ohio State University evaluations of Head Start, a preschool program for deprived children, resulted in estimates of program effect that were near zero and in some cases negative. At worst, it seemed from their conventional covariance analyses that the program hurt rather than helped. Secondary analyses of the same data by Magidson (1977) capitalized on structural equation models similar in character to those that Zak uses. The new results, based on less demanding assumptions than the original work, suggest that the program had positive though small effects on the cognitive ability of children who participated.

The second point is that this benefit of new methods also produces ambiguity. Disparate models, theories of behavior if you will, may fit the data equally well. So, for instance, the Magidson results are being debated by other analysts who believe their models are at least as appropriate. This ambiguity is typical, and consequently it behooves the analyst to fit several models to his data, in much the same way that Zak has done.

Another point worth recognizing stems from the fact that these new model-fitting approaches have developed independently of methods in conventional randomized experiments. The two cultures here are different, but failing to recognize their linkages would be a mistake. It is possible, for instance, to express the ordinary analysis of variance model that underlies randomized tests in terms of structural models. Lee Wolins and I have done so for one class of models, but not much other work seems to have been done. Perhaps more important, the structural models lend themselves to internal analyses in experiments. That is, having discovered that a program has an overall effect, based on a randomized design and conventional analysis, one may exploit the new methods in path analysis to better understand links between specific components of the program, specific types of participants, and specific outcome variables. Something of the sort has been tried in an analysis of prison parole programs by Rossi, Berk, and Lenihan (1980). More distantly related approaches are not uncommon in analyzing the results of field experiments on income subsidy programs for the poor (see Boruch, Cordray, and Wortman 1981 for other illustrations).
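The linkage between the two cultures can be sketched in a few lines. In a two-group randomized test, the analysis of variance model y = μ + τT + e is itself a one-equation linear structural model, and the least-squares estimate of τ is simply the difference between group means. Adding a hypothetical mediating variable M (say, hours of tutoring actually received) gives a simple path model in which the total effect decomposes exactly into a direct path and an indirect path through M. The Python sketch below uses simulated data with hypothetical coefficients; it illustrates the algebra only, and a causal reading of the M-to-outcome path needs assumptions that randomization of T alone does not supply.

```python
import random
import statistics

def slope(y, x):
    """Least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_slopes(y, x1, x2):
    """Least-squares coefficients of x1 and x2 in y = c + b1*x1 + b2*x2,
    solved from the centered normal equations by Cramer's rule."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(7)
n = 2000
# Randomized assignment: T = 1 for the program group, 0 for controls.
T = [rng.randint(0, 1) for _ in range(n)]
# Hypothetical mediator, e.g. hours of tutoring received (higher under the program).
M = [2.0 * t + rng.gauss(0.0, 1.0) for t in T]
# Hypothetical outcome with a direct program path and a path through M.
Y = [0.3 * t + 0.5 * m + rng.gauss(0.0, 1.0) for t, m in zip(T, M)]

# ANOVA view: the treatment effect is the difference between group means.
mean_diff = (statistics.fmean([y for y, t in zip(Y, T) if t == 1])
             - statistics.fmean([y for y, t in zip(Y, T) if t == 0]))
total = slope(Y, T)              # the same number, as a structural coefficient
a = slope(M, T)                  # path T -> M
direct, b = two_slopes(Y, T, M)  # path T -> Y holding M fixed, and path M -> Y
print(f"difference in means {mean_diff:.3f}, total path {total:.3f}")
print(f"direct {direct:.3f} + indirect {a * b:.3f} = {direct + a * b:.3f}")
```

For least-squares fits the decomposition total = direct + a·b holds exactly in any sample, because the residual of the full equation is uncorrelated with T; only its interpretation depends on assumptions.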


7. STANDARDS AND GUIDELINES


One of the justifications for the Holtzman Project was Congressional interest in whether evaluations could be subjected to uniform standards for judging their quality. Indeed, a variety of guidelines to judge quality have been developed by the U.S. General Accounting Office, the Evaluation Research Society, and the Joint Committee on Standards for Program Evaluation, a group whose members include representatives of most professional associations with an interest in evaluation. Crude standards are also embodied in certain federal activities, notably the Joint Dissemination and Review Panel, whose mission is to assess the evidence on locally developed programs in order to determine whether the programs merit federal support for distribution to other local agencies. There is substantial overlap in topical coverage among guidelines, but they differ in detail. Our review led to the following recommendation: while recently developed standards and guidelines for evaluation should not be incorporated into law, they are sufficiently well developed to recommend that the Congress: 1) use such guidelines to understand what can reasonably be expected of evaluations, 2) direct that agencies use them as a guide, where appropriate, to developing criteria for judging evaluation plans submitted by local and state agencies, and 3) elicit assistance in the interpretation of guidelines from Congressional support agencies, such as GAO, that have been instrumental in their construction.

The main reason for recommending that guidelines be recognized officially is that we believe they can be useful in explaining what is meant by evaluation to the public and its representatives, and in informing the public about what can reasonably be expected of evaluation projects. Guidelines may also assist in protecting the competent evaluator from incompetent criticism. They should certainly help one to identify inept evaluations.


We argued against incorporating such standards into law because neither evaluation law nor the standards are as yet sufficiently well developed to justify incorporation. Moreover, giving legal status to specific guidelines can impede the development of better guidelines, and guidelines with legal status are almost certain to be applied inflexibly and are likely to do more damage than good in other respects.

Remarks

Most standards are general, and their proper application depends on circumstance. A survey of needs, for instance, requires a subset of the criteria in a complete list; a randomized experiment requires a few of the same criteria, but others must be employed as well. The Holtzman Report did not address this matter in detail. Instead, it focused on when guidelines should be used, notably in judging the merit of grant proposals and contracts that contain evaluation plans and, once the program is evaluated, in judging the quality of the evaluative evidence.

The ICC evidently takes a somewhat different approach in ascribing the idea to "minimal evaluation requirements". As I understand Lewy's remarks, the concern is not only with when evidence becomes material but with the general kind of evidence as well. Expert judgement is regarded as essential for judging the quality of materials, observation of the teaching and learning processes is essential during early try-outs of the material in classes, and assessing the cognitive achievements of children is essential at the end of the first try-out. The first two activities are the responsibility of the program developers, and the last is undertaken primarily by the evaluator.

The ICC criteria are not incompatible with other, more elaborate guidelines. They can be regarded as a distinctive operationalization of items that appear in more general lists and, partly because of their brevity, are likely to be a useful operating rule in at least some large local education agencies in the U.S. And there are distinctive parallels to this minimalist approach in some U.S. evaluation agencies. The Office of the Inspector General in the Department of Health and Human Services, for example, undertakes fast turnaround studies that rely heavily on expert bureaucratic judgement and some on-site observation of processes in, for example, health services delivery (Hendricks 1981). But this effort is dedicated to administrative failures rather than to programs in general. There is some similarity also to recent Agency for International Development efforts to execute fast turnaround studies of foreign assistance projects. Here, as in the Israeli case, there is often little time to do much more than obtain expert judgement and crude observations.

Implicit in both the Holtzman Report and Lewy's standards paper is the idea that guidelines for judging quality can range considerably, from the very permissive to the very general. The clearest illustration of both the idea and its exploitation comes from the engineering sciences. Here, the demand level of the standard depends on the uses to which the information is put. Relatively wide tolerances are permissible in local geophysical measurements, since the information is used primarily by lawyers and construction engineers. Much closer tolerances are required in geophysical measures made for some scientific studies, on land erosion for instance. The same spirit is evident in the ICC's minimalist approach, expert judgement normally being less precise and certainly less verifiable than elaborate observation. It is evident too, but underexploited, in the U.S. It is sensible to consider using one set of standards in special grants made for small innovative programs and a less restrictive set for programs that operate with regular budgets. It may be sensible to require larger, more capable school districts to provide information that accords with more rigorous standards, and to require much less precise information from the less capable ones. The implications have not been worked out, nor have the options been articulated well, in evaluation management in the U.S. or abroad. Both tasks should be undertaken.

8. THE USE OF EVALUATION RESULTS

Whether the results of evaluation are used, and by whom they are used, was a fundamental concern of the Project. We uncovered a great deal of information on use, but the research was difficult. Not the least of the difficulties was lexical confusion. A federal director of research, for instance, announced that he did not perform evaluation at all, despite a list of projects under his direction that included evaluations labelled as such. His superior, interviewed later, claimed the contrary: that the division produced a great many evaluations, all useful to management. We encountered a Congressional staffer who announced baldly at the beginning of an interview that his committee did not use evaluations, yet he later said that evaluation reports were used to guide committee hearings on programs.

This confusion, or at least inconsistency, underlay a good deal of debate about the utility of evaluation results. And so we defined "use" explicitly to mean: 1) applying these results in making specific decisions about law, regulations, budgets, or related administrative topics, and changes in the substantive content of programs, 2) capitalizing on them to enhance understanding of issues even where a decision could not be made, or 3) exploiting the information to persuade others, as in political speeches, or to confirm one's own beliefs. The Project's efforts to document use and nonuse of evaluation reports focused on specific evaluations and stressed the corroboration of evidence from different sources. The findings suggested, as one might expect, that some evaluations are used and some are not, and that use depends heavily on the planning of use, close relations between the user and evaluator, and willingness and capacity to use results.

Our first recommendation bearing on use of evaluation results was made to the Congress. We urged that its members:

1) direct the staff of relevant committees, the Department, and the [...] to routinely outline which institutions can reasonably be expected to use results of each major evaluation, and how such results might be used, during the design stage of every major program evaluation;

2) specify exactly which evaluations have been used and why they were used, and which have not been used and why they were not used, in authorizations and appropriations committee reports;

3) require evidence about specific changes resulting from evaluation whenever the law requires state agencies to describe uses of evaluation; and

4) explore the feasibility of direct competitive grant and contract programs focused on improving the use of results at the local and state education agency levels.

The origins of the first part of the recommendation lie in the absence of any mechanism for planning use at the national level. Simply put, unless specific user groups are identified and some decision options laid out, evaluation results are less likely to be used. Indeed, if there is no clear way to link the evaluation with decisions or considerably better understanding, one can argue that the evaluation should not be performed at all. Specifying users and options will also make it easier to track utilization, and that, in turn, will help to inform judgements about how evaluation resources could be better allocated. The recommendation to cite useful and useless evaluations in federal reports and to require SEAs and LEAs to record specific changes has the same objectives: better understanding of use and better resource allocation. The suggestion to identify useless evaluation is not an invitation to criticize arbitrarily. We found that some local and state education agencies are capable of, and interested in, inventing and testing better ways to use information. The suggestion to expand their opportunities for doing so is based on this.

A second, related recommendation was made to the Department of Education. The Report urged that the Department: 1) direct evaluation unit staff and evaluation contractors to provide oral reports regularly, as well as written reports, on the results of major evaluations and on the uses to which results can be put, to relevant Congressional staff, support agency staff, and program staff within the Department; 2) create a system to periodically collect, synthesize, and report the specific uses to which evaluation is put; 3) improve the Annual Evaluation Report by citing instances of use more specifically; and 4) direct evaluation staff to meet regularly with Congressional staff to clarify information needs, feasibility of evaluation, audiences for results, and ways in which results can be used to modify programs.

This cluster of suggestions is based partly on the finding that the use of evaluation results is not tracked conscientiously, and on the belief that it ought to be tracked in order to learn how to perform evaluations better and how to allocate evaluation resources better. The rationale for the last recommendation is identical to the one given earlier for the Congress on planning and executing evaluations.

The final suggestion under this topic focuses attention on assuring access to, and better specification of, reports.

We recommended that the Department 1) adhere to a clearance rule which makes evaluation reports automatically available after a fixed number of weeks; 2) specify completely the evaluation documents referred to in the Department's Annual Evaluation Report, the Federal Register, and policy statements; and 3) include, in every major evaluation report, a list of core recipients.

The recommendation stems partly from difficulties encountered in obtaining reports under review by the Executive Secretariat of the Department of Health, Education, and Welfare and by other groups involved in the DHEW clearance process. We also found it difficult to identify reports precisely when they were cited as evidence of the usefulness of evaluation in developing regulations or policy. The absence of a list of core recipients of reports made it very difficult to identify potential user groups and to determine whether reports were used. The consequence is that what is useless or useful is less verifiable.

Remarks

The basic idea that there needs to be a group constituted to reason from the data is implicit both in our recommendations and in Israeli operations. The form differs a bit, though.

Committees to Reason from Data. Consider, for instance, Kugelmass' description of the Van Leer study of primary schools, a massive undertaking to understand the process and products of primary schools in Israel, including controversial issues such as integration and religious vs. non-religious school systems. One distinctive aspect of this enterprise was that the Ministry of Education and Culture and the Chief Scientist took pains to appoint a high-level committee to understand the ways in which the results of this research could be used by the Ministry. The Chief Scientist, as well as heads of various divisions of the Ministry and of the primary school system, were involved in the committee, and a number of subcommittees were set up to recognize special interests and problems. A mechanism of this sort was not suggested in the Holtzman Report, but it does seem to be a sensible way to understand how policy implications can be educed from such data. The Holtzman Report assumed that negotiation and regular communication between the Congress and the Department would facilitate understanding of the uses to which the data could be put. The power that a blue-ribbon committee has to do this, or to exploit suggestions made at lower levels, could be used to make negotiation more effective.

Decision Options. The Holtzman Report marshalled a good deal of evidence on the use of evaluations in decisions at the federal level, but we obtained very little for local levels because of the difficulty of corroborating use. Nor did we classify the uses according to the decisions they might concern. Partly for this reason, the classification schemes developed by Lewy and Davis are pertinent. They are tidy ways to specify decision options. Moreover, we can exploit them to explain our own recommendation about making decision options clear before evaluation is undertaken.

For example, Lewy identifies three decisions that might stem from an evaluation done by an evaluation unit. The first is the selection of program components: what should be taught, what materials might be included in the teaching, and so on. He is careful to point out that the program developer is ultimately the one who must choose from among the several alternatives examined in evaluation. The second decision is modifying a program. He suggests that it "may turn out that some element such as exercises, illustrations, or explanations contains certain flaws". It is up to the evaluator to call attention to these. The third decision option has to do with qualifying the use of the program. Here he stresses the "optimal" conditions under which the program might work or the minimal conditions for usage. This includes, for example, whether the program will work with little or no training of teachers rather than with a good deal of training, whether equipment, space, and the like are available, and so on.

This point is important for U.S. evaluators. The Holtzman Report pointed out the inability or unwillingness of various audiences to specify decision options before an evaluation is actually undertaken. Yet here Lewy implies that the process is almost a matter of course for the Israelis. More impor[...] therefore less controversial.

[...] Davis [...] decisions that [...] of a [...] evaluation plan and to organize these decisions [...] the results of the evaluation. For instance, the first possible result of a [...] evaluation is [...] effects for the optimal version of the program. The implications [...] the "efficacy" of a program [...] and [...] recommend either dropping it or revising [...]. A second possible result is that the standard evaluation results in a large effect and subsequent field evaluations show [...] results. It is at this point that the program director, according to Davis, should think about alternative ways of improving the field version of the program. The third case is that evaluation of the optimal version [...] program effects and [...] of the generated evaluations result in moderate effects. Here Davis [...] consideration and cost-benefit analysis. The field [...] be standardized, and [...] reduced.

[...] Information. Tamir's strategy of briefly identifying [...] Office of Evaluation's annual report is composed. The uses he identifies are interesting.

For example, he cites a 1974 paper showing that students [...] of statistics in biology. Another [...] found that students in agricultural schools achieved poorly on th[...]; the reason [...] "the material probably had no application [...]. The action taken was to prepare special modules to suit the interest of agricultural students". Similarly, it was found that culturally deprived students, [...] from developing countries, achieved poorly, and the consequent action was preparing a Hebrew version of the material of the test designed for the original course and designing supplementary modules as well as in-service teacher training.

Such items are very persuasive. More importantly, a listing of the sort provided by Tamir might serve as a good model of the sort of information which should be recognized in the university training of evaluators. It is the kind of information that can be provided to program development staff, to guide them in making decisions about how to "reason from the data", and to senior executives and perhaps legislative staff, to show how evaluation results have been used. And the model is likely to be useful at the local level, at the least to illustrate use and at best to foster invention of other approaches.

Definition of Use. None of the seminar papers put much stress on defining the use of evaluation results. But several recognize difficulties engendered by ambiguity in the word, and a variety of definitions are implicit. Blass, for instance, recognizes, as the Holtzman Report does, that arguments about utilization may be gratuitous, citing the "usual almost ritual litany about the underutilization of research results". The implicit definitions range widely. At one end is examination of a report, judging from the Kugelmass paper: "The very process of bringing senior decision makers (through the Office of Chief Scientist) into continued examination of the research... may be the most important product of the process of evaluations, and not necessarily specific decisions". The Holtzman Report recognizes the same kind of use in defining evaluation and exploits it in enumerating references made to evaluations in Hearings routinely issued by Congressional Committees charged with authorizing funds for education programs.

Tamir's listing is much more specific and implies a different category of uses, also identified in the Holtzman Report: specific decisions. The category is important but was rather difficult to examine. To be sure, some evaluations, such as the National Institute of Education's Compensatory Education Study, are remarkable in that evidence on use is available from the public record, such as Hearings, and can be corroborated through interviews with legislative staffers and bureaucrats. Moreover, it was relatively easy to trace ties between items in the Study and subsequent changes in federal regulation and administrative practice. Others are not as easy.

Overreporting of use is likely, for instance, if one talks only to the producers of a study. Underreporting is chronic, partly because of memory lapse, the absence of written accessible records on use, and the time it may take for a report to filter through several layers of bureaucracy. The centralization of Tamir's project may make tracking utilization easier than in a decentralized enterprise.



Factors Influencing Utilization. For Blass, the most important factors influencing use are the political situation, the organizational structure of the society, personal attributes of the evaluator or of the decision maker (especially of the latter), the state of the art in the scientific discipline, and the character of the issues.

The Holtzman Report did not frame the influences on evaluation this way, but the conceptualization seems natural for Israeli operations and for some U.S. activities. My understanding is that evaluation is more centralized in Israel, if the organization of the ICC, the High School Biology Project, and the Office of the Chief Scientist are any index. This stands in contrast to many U.S. efforts, in that evaluation responsibilities are dispersed across levels of government (local, state, and federal) and across agencies within levels of government. Judging from the Kugelmass paper, the Office of the Chief Scientist can relate directly to the political concerns of the Ministry of Education. In the U.S., the several layers of bureaucracy between the Secretary of Education and the evaluation unit within the Department, for good or ill, probably affect the political attentiveness of the latter and the receptivity of the former.

Centralization does appear at times in special studies undertaken in the U.S. For example, the NIE Compensatory Education Study was created by law to examine, among other things, how federally funded programs for poor primary school children fared. It was centralized in that a team approved by Congressional staff was created to be answerable to the Congress alone. It was deliberately sensitive to political issues and accommodated them through continuous negotiation between the evaluation group and Congressional interest groups. The personal attributes of the group leader, Paul Hill, including his prior experience as a bureaucrat and Congressional staff member, seem to have been very influential in producing useful results.

There are, of course, ways to characterize factors that influence the use of evaluation other than the one that Blass proposed. In some cases they may be more productive. Consider, for example, that each of the seminar papers stresses the richness and the diversity of the process by which evaluation results are used or not used. The papers are important in this respect, but for the moment let me propose a model which may help us to understand better the order that underlies the diversity.

The following questions imply distinct events that determine whether evaluation results are used:

- Does the prospective user know about the evaluation results?
- If results are known, are they understood?
- If understood, are they believed?
- If believed, does the user have the ability and willingness to use them?

When all are answered in the affirmative, results are likely to be used. But the first negative will reduce any possibility of use, especially if the potential user is the same at each step.

This way of describing the utilization process is useful by itself. For example, it suggests that simple probabilistic models may be helpful in understanding why some research on utilization is misleading, and how one might enhance utilization. The simplest such model posits that each event is independent and that a probability is attached to each question's resulting in a Yes. If the probability for each is only 1/2, say, then the overall probability of a "use" occurring is (1/2)^4 = 1/16. If, as I suspect, the odds are lower on each Yes, say 1/4, then the probability of a user's knowing about results is 1/4, the probability of the user's understanding them is 1/4, and so on; the overall probability of use is (1/4)^4 = 1/256, so the odds against results being used are 255 to 1. Not very promising.
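The arithmetic of the simplest (independent) model can be checked in a few lines. This is a minimal sketch added for illustration, not part of the original analysis; the names `prob_use_independent` and `odds_against` are hypothetical, and exact fractions are used so the results can be compared directly with the figures above.

```python
from fractions import Fraction

# The four sequential questions of the utilization model.
STAGES = ("knows about results", "understands them",
          "believes them", "able and willing to use them")

def prob_use_independent(p_yes):
    """Overall probability of use when each stage is an independent Yes with probability p_yes."""
    return p_yes ** len(STAGES)

def odds_against(p):
    """Odds against an event of probability p, i.e. (1 - p) / p."""
    return (1 - p) / p

p_half = prob_use_independent(Fraction(1, 2))
p_quarter = prob_use_independent(Fraction(1, 4))
print(p_half)                   # 1/16
print(p_quarter)                # 1/256
print(odds_against(p_quarter))  # 255
```

Because every stage must succeed, the overall probability shrinks geometrically with the number of stages, which is why modest per-stage odds yield such poor overall odds.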

Other models, though, are more realistic. If, for example, we suppose that the evaluation process is centralized so that the prospective user is also the producer of the information, then the probability of being willing to use the information is conditional on the prior events, and the likelihood of use is closer to 100%. Indeed, the conditional model is a numerical representation of what happens when a liaison person is used in ICC evaluations in Israel and when a brokerage system is used to translate findings into usable results, as Cooley (1980) did in the Pittsburgh school district.
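In the conditional model each factor is the probability of a Yes given Yes answers at all earlier stages, so by the chain rule the overall probability of use is the product P(know) x P(understand | know) x P(believe | understand) x P(willing | believe). The sketch below assumes purely hypothetical stage values (they are not estimates from any study) to contrast a setting with low per-stage odds against a centralized one in which the user also produces the information.

```python
from math import prod

def prob_use_chain(conditionals):
    """Chain-rule probability of use: the product of the per-stage conditional probabilities."""
    for p in conditionals:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each conditional probability must lie in [0, 1]")
    return prod(conditionals)

# Hypothetical values, for illustration only.
decentralized = [0.25, 0.25, 0.25, 0.25]  # low odds at every stage
centralized = [1.0, 0.95, 0.95, 0.95]     # producer is the user, so each Yes is nearly assured

print(round(prob_use_chain(decentralized), 4))  # 0.0039
print(round(prob_use_chain(centralized), 4))    # 0.8574
```

The independent model is the special case in which each conditional probability equals the corresponding unconditional one; centralization raises the later factors because a user who produced the results already knows, understands, and largely believes them.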
The model does suggest that we collect data at each stage to estimate properly the odds on ultimate use of evaluation results. But little work appears to have been done on this problem in the U.S. Most research deals with the question: What does the process of use look like in particular case studies? It seems to me that the case studies are important to inform such models, as well as being important as qualitative descriptions of the process. But they require large samples of events across sites, or of multiple events within sites, to obtain decent estimates of the parameters in the model.
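Under the simplest assumptions, estimating these parameters reduces to tallying how many prospective users reach each stage and taking the ratio of successive counts, which is the maximum-likelihood estimate of each conditional probability. The tallies below are hypothetical, and pooling them across sites assumes the stage probabilities are similar from site to site, an assumption the case studies themselves would need to check.

```python
def estimate_stage_probabilities(counts):
    """Estimate P(stage k | stage k-1) as counts[k] / counts[k-1].

    counts[0] is the number of prospective users; counts[k] is how many of them
    reached stage k (knew, understood, believed, were able and willing).
    """
    if any(later > earlier for earlier, later in zip(counts, counts[1:])):
        raise ValueError("counts must be non-increasing across stages")
    # If no one reached the earlier stage, the rate is undefined; report 0.0.
    return [later / earlier if earlier else 0.0
            for earlier, later in zip(counts, counts[1:])]

# Hypothetical tallies pooled across sites (illustration only, not data from any study).
counts = [200, 120, 90, 60, 30]
rates = estimate_stage_probabilities(counts)
print([round(r, 3) for r in rates])  # [0.6, 0.75, 0.667, 0.5]
print(counts[-1] / counts[0])        # 0.15, the product of the stage rates
```

Small samples make the later ratios unstable, since each rests only on the users who survived the earlier stages; this is the practical reason large samples of events are needed.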
Notes

1 The research on which this paper is based was supported by the Division of Evaluation of the U.S. Department of Education and the National Institute of Education. The discussion of Israeli work here is based on material presented at the Israel-US seminars on evaluation in June 1980. The material on U.S. policy draws heavily on Boruch and Cordray (1980) and other documents cited in the text.

2 See Verma's (1977) discussion of research on medicine in medieval India and Rabinovitch's (1973) fine description of rabbinic thought on evidence during the ninth through twelfth centuries. The Syrian experiment is described by the American researcher F.S. Chapin (1947); the English and Scottish tests are described in Cochran (1976), among others.

3 Two other efforts, independent of these, are worth examining because their conclusions differ at times from ours. The first, undertaken by RAND Corporation staff, appears in a monograph by Pincus (Ed.). The second, prepared by members of Stanford's Consortium on Evaluation Research, is presented in a volume by Cronbach and others (1980).